Application-Specific Schema Design for Storing Large RDF Datasets
Authors
Abstract
Realizing the vision of the Semantic Web, a semantic model for encoding content on the World Wide Web, requires efficient storage and retrieval of large RDF data sets. A common technique for storing RDF data (graphs) is to use a single relational database table, a triple store, for the graph. However, we believe a single triple store cannot scale to the needs of large-scale applications. Instead, database schemas that can be customized for a particular dataset or application are required. To enable this, some RDF systems offer the ability to store RDF graphs across multiple tables. However, tools are needed to assist users in developing application-specific schemas. In this paper, we describe our approach to developing RDF storage schemas and describe two tools that assist in schema development. The first is a synthetic data generator that generates large RDF graphs consistent with an underlying ontology, using data distributions and relationships specified by a user. The second tool mines an RDF graph or an RDF query log for frequently occurring patterns. Knowledge of these patterns can be applied to schema design or caching strategies to improve performance. The tools are being developed as part of the Jena Semantic Web programmers’ toolkit, but they are generic and can be used with other RDF stores. Preliminary results with these tools on real data sets are also presented.
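The contrast the abstract draws between a single triple table and an application-specific schema can be illustrated with a minimal sketch. It is not the paper's tooling or Jena's storage layout: the sample triples, the `person` table, and all function names below are hypothetical, and SQLite stands in for whatever relational database is used. The idea shown is that two predicates that frequently occur together on the same subject can be clustered into one row of a dedicated table, so a query that needs both avoids a self-join on the triple table.

```python
import sqlite3

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:mbox", "bob@example.org"),
    ("ex:bob", "ex:likes", "ex:alice"),
]

NAME_PRED = "foaf:name"
MBOX_PRED = "foaf:mbox"


def build_stores(triples):
    """Load the same graph into a generic triple table and into an
    illustrative application-specific table for the clustered predicates."""
    conn = sqlite3.connect(":memory:")

    # Generic layout: one row per triple.
    conn.execute("CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)

    # Application-specific layout (assumed for this sketch): one row per
    # subject, one column per clustered predicate.
    conn.execute(
        "CREATE TABLE person (subj TEXT PRIMARY KEY, name TEXT, mbox TEXT)"
    )
    rows = {}
    for subj, pred, obj in triples:
        if pred in (NAME_PRED, MBOX_PRED):
            rows.setdefault(subj, {})[pred] = obj
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(s, v.get(NAME_PRED), v.get(MBOX_PRED)) for s, v in rows.items()],
    )
    return conn


def names_and_mboxes_from_triples(conn):
    """Fetch (name, mbox) pairs from the triple table; needs a self-join."""
    return conn.execute(
        """
        SELECT n.obj, m.obj
        FROM triples AS n
        JOIN triples AS m ON n.subj = m.subj
        WHERE n.pred = ? AND m.pred = ?
        ORDER BY n.obj
        """,
        (NAME_PRED, MBOX_PRED),
    ).fetchall()


def names_and_mboxes_from_person(conn):
    """Fetch the same pairs from the clustered table; no join required."""
    return conn.execute(
        "SELECT name, mbox FROM person ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    conn = build_stores(TRIPLES)
    print(names_and_mboxes_from_triples(conn))
    print(names_and_mboxes_from_person(conn))
```

Both queries return the same pairs. In this sketch the triple table still holds every statement, including predicates such as `ex:likes` that were not clustered; which predicates to cluster is the kind of decision the pattern-mining tool described above is meant to inform.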
Related resources
Supporting Scalable, Persistent Semantic Web Applications
To realize the vision of the Semantic Web, efficient storage and retrieval of large RDF data sets is required. A common technique for persisting RDF data (graphs) is to use a single relational database table, a triple store. But, we believe a single triple store cannot scale for large-scale applications. This paper describes storing and querying persistent RDF graphs in Jena, a Semantic Web pro...
Generating RDF for Application Testing
Application testing is a critical component of application development. Testing of Semantic Web applications requires large RDF datasets, conforming to an expected form or schema, and preferably, to an expected data distribution. Finding such datasets often proves impossible, while generating input datasets is often cumbersome. The GRR (Generating Random RDF) system is a convenient, yet powerfu...
Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema
RDF and RDF Schema are two W3C standards aimed at enriching the Web with machine-processable semantic data. We have developed Sesame, an architecture for efficient storage and expressive querying of large quantities of metadata in RDF and RDF Schema. Sesame’s design and implementation are independent from any specific storage device. Thus, Sesame can be deployed on top of a variety of storage d...
RDF-3X: a RISC-style engine for RDF
RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and also Web 2.0 platforms. The “pay-as-you-go” nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper p...
Materializing and Persisting Inferred and Uncertain Knowledge in RDF Datasets
As the semantic web grows in popularity and enters the mainstream of computer technology, RDF (Resource Description Framework) datasets are becoming larger and more complex. Advanced semantic web ontologies, especially in medicine and science, are developing. As more complex ontologies are developed, there is a growing need for efficient queries that handle inference. In areas such as research,...
Publication date: 2003